Over the next seven years the Washington Pre-College (WPC) statewide
testing program plans to replace the standard battery in use since 1964 with
an entirely new array of predictor and placement tests. One intent of the
development and validation of new tests is to give the Program complete
control over test materials, but there are other compelling reasons. WPC
predictor variables now include age, sex, six high school (HS) GPAs, and 12
tests: Vocabulary, English Usage, Spelling, Reading Speed, Reading
Comprehension, Data Sufficiency, Quantitative Judgment, Functional
Relationships, Applied Mathematics, Mathematics Achievement, Spatial Ability,
and Mechanical Reasoning. Although these tests were constructed for the
Program in 1962-63 by Educational Testing Service, they represent substitutes
for the initial battery of tests selected according to the differential model
in 1954 (Lunneborg, 1966). As a result, the predictors now employed are ones
established as useful solely for discriminating among academic courses at one
university (University of Washington). This means that the current Program
is based upon the decision-making needs of a select group in the middle to
upper range of intellectual potential choosing among University curricula,
e.g., to major in psychology vs. oceanography. Although the Program's success
rests on the established ability of these predictors to work throughout the
state, in other schools, and for other course areas (Cory, 1968; Lunneborg,
1966; Lunneborg & Lunneborg, 1969), the desired differential aspect of
prediction for these new and always expanding course criterion areas is
questionable. The current battery is thus not meeting the needs of the bulk
of high school seniors, i.e., those who must choose among vocational-technical
programs or between these and an academic curriculum, students spanning the
range of intellectual aptitude, and students starting higher education in the
community college setting. Thus, new test predictors must be developed to
match the complexity of the differences among educational experiences open to
high school seniors in the 1970's.

The solution to these problems rests with the construction and evaluation
of new tests. Test predictors of student success in a host of college academic
courses and an equally large number of vocational/technical courses must be
identified and selected for all post-high school institutions in the state.
It is also essential that the tests developed be capable of meeting the needs
of both academic and vocational institutions for placement and classification
of students. In order to provide for the maximum articulation of the guidance
process for individuals, a given school must be able to use test scores to
place freshmen in its various levels of introductory English, to determine the
eligibility of prospective vocational trainees for sheet metal, welding, sales,
etc., and to give advanced credit. Thus, while continuing to place greatest
stress upon its unique differential prediction character, the Program would
like to provide its participants with two separate kinds of output: prediction
data and placement data. Prediction testing would be done in spring of the HS
junior year and placement testing in spring of the HS senior year. Although
all students would complete the same predictive battery, placement testing
would be tailored to the student's background, choice of college, and
intended area of study.

This plan for evolving a comprehensive guidance program for high school
graduates naturally will not stop with the construction and selection of
differential predictors. Once tests have been selected, normative and
validational data must be collected at all institutions to provide students
and counselors with interpretive materials.

Testing with HS juniors in spring 1969 was accomplished with the following
goals in this long-range test development plan in mind:

(1) Item analyses of the eleven current WPC (Form B) tests (Reading
Speed excluded), on the basis of which short forms of these 11 tests would be
devised for future use.

(2) Evaluation of tests which might replace existing battery elements,
i.e., Spelling and Mathematics Proficiency.

(3) Evaluation of tests which might extend the existing battery. Inasmuch
as the WPC program intends that the new battery contain the factor
complexity of the GATB, certain GATB tests were given together with
experimental materials designed to measure the same things.

Experimental test materials were divided into nine sets which were
administered to nine student samples roughly equal in terms of high school
size. The nine sets of experimental tests were:

(1) GATB Forming Metal Objects (FMO) and Alpha Numeric Assembly A (ANA)

(2) FMO and Alpha Numeric Assembly B (ANB)

(3) GATB Name Matching (NM) and Perceptual Accuracy A (PAA)

(4) NM and Perceptual Accuracy B (PAB)

(5) Vocational Interest Inventory (VII)

(6) GATB Mark Making (MM) and Motor Coordination (MC)

(7) Spelling (Spe)

(8) Mathematics Proficiency A (MPA)

(9) Mathematics Proficiency B (MPB)

As Table 1 indicates, the actual numbers of students receiving each
experimental test were considerably greater than the numbers of cases used for
correlations with the standard battery. This resulted from the unavailability
of WPC test results for students tested in the eastern half of the state and
processed at Washington State University. Thus, although the item analyses
of the experimental tests are based on statewide samples, both the item
analyses of the standard WPC tests and the correlations of WPC tests with
experimental tests are restricted to western Washington HS juniors.
Approximately 19,000 Ss were tested in the west and 8,000 in the east.


Item analyses of current WPC tests. These analyses were done using
successive groups of 2,500 cases from the western sample. The focus of each
was a recommendation for the short form of that test.

(1) WPC English Usage (Section A, 60 items in 30 minutes; Section B, 30 items
in 20 minutes)

Mean score was 39.4, SD 15.0, KR-21 reliability .93 for 2515 cases. Two
items (numbers 36 and 80) should be excluded from further testing, as they had
point biserial correlations of -.01 and .03 with the total score, apparently
due to poor construction. A half-length test of English Usage can be formed
by dropping from Section A items 3, 4, 5, 7, 12, 15, [illegible], 18, 21, 23,
24, 25, 29, 30, 33, 34, 36, 38, 39, 40, 43, 45, 46, 49, 50, 52, 55, 58, and 60
and from Section B items 63, 65, 86, 69, 70, 71, 74, 7[?], 78, 79, 80, 81, 85,
89, and 90. The excluded items were the least discriminating items.
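
The point biserial correlation used throughout these item analyses can be sketched as follows. This is a minimal illustration of the standard formula, not the Program's actual scoring routine; the function name and the four-examinee data set are hypothetical.

```python
from math import sqrt

def point_biserial(item_scores, total_scores):
    """Point-biserial r between a right/wrong (1/0) item and the total score."""
    n = len(item_scores)
    passed = [t for i, t in zip(item_scores, total_scores) if i == 1]
    failed = [t for i, t in zip(item_scores, total_scores) if i == 0]
    if not passed or not failed:
        raise ValueError("item must be passed by some examinees and failed by others")
    p = len(passed) / n                       # proportion passing the item
    mean_all = sum(total_scores) / n
    # Population SD of the total score (divide by n, not n - 1).
    sd_all = sqrt(sum((t - mean_all) ** 2 for t in total_scores) / n)
    mean_pass = sum(passed) / len(passed)
    mean_fail = sum(failed) / len(failed)
    return (mean_pass - mean_fail) / sd_all * sqrt(p * (1 - p))

# Hypothetical data: four examinees, one item.
r = point_biserial([1, 1, 0, 0], [4, 3, 2, 1])
print(round(r, 3))
```

A value near zero or below zero, as reported for items 36 and 80, means that examinees who passed the item did not score higher on the test as a whole than those who failed it.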

(2) WPC Spelling (50 items, 10 minutes)

Mean score was 15.4, SD 8.6, KR-21 reliability .87 for 2515 cases. One
item, 23, although keyed correctly, had a point biserial r of -.07 with total
score (too difficult) and should be excluded from further testing. This test
could be shortened to a 40-item, 8-minute test by dropping items 2, 7, 13, 19,
23, 26, 31, 40, 44, and 50, the poorest item in each successive group of five.
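
The KR-21 reliabilities quoted for each test depend only on the number of items, the mean, and the SD. The sketch below applies the standard Kuder-Richardson formula 21 to the Spelling figures given above; the function name is hypothetical, and it is an assumption that the report used this textbook form on raw number-right scores.

```python
def kr21(n_items, mean, sd):
    """Kuder-Richardson formula 21 from test length, mean, and SD."""
    variance = sd ** 2
    return (n_items / (n_items - 1)) * (
        1 - mean * (n_items - mean) / (n_items * variance)
    )

# WPC Spelling: 50 items, mean 15.4, SD 8.6 (figures from the text).
print(round(kr21(50, 15.4, 8.6), 2))  # 0.87
```

The result agrees with the .87 reported for WPC Spelling. Values for other tests need not reproduce exactly from the rounded means and SDs, particularly if scores were corrected for guessing.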










with negative point biserial r's should have their keying reversed: correct
keying is E for item 17 and C for item 18. The timing of a shortened Reading
Comprehension test is somewhat difficult to predict. However, if item 5 is
removed from the first reading selection, 7 and 8 from the second, 1[?] from
the third, 26 from the fourth, and 33 from the sixth, and the seventh
selection is dropped entirely, a 15-minute time limit would seem reasonable.

(4) Mechanical Reasoning (35 items, 25 minutes)

Mean was 7.3, SD 7.2, KR-21 reliability .88 for 2436 cases. Two items,
21 and 33, have been miskeyed. Reducing to a 20-item, 15-minute test can
be accomplished by dropping items 1, 4, 7, 8, 12, 13, 16, 17, 21, 22, 25,
28, 29, 33, and 35.

(5) WPC Spatial Ability (24 items, 15 minutes)

Mean was 9.3, SD 4.8, KR-21 reliability .85 for 2436 cases. Item 22,
despite correct keying, is too difficult because of poor construction
(r = -.01) and should be dropped. This test could be reduced to a 16-item,
10-minute test by dropping items 2, 3, 7, 10, 17, 18, 22, and 24.

(6) WPC Applied Mathematics (30 items, 20 minutes)

Mean was 9.6, SD 4.8, KR-21 reliability of .7[?] for 2436 cases, with no
particular item problems. Applied Math could be shortened to a 20-item,
13-minute test by dropping items 2, 3, 10, 11, 14, 15, 23, 24, 28, and 30.
Relatively low reliability makes this a low priority change.



(7) WPC Vocabulary (100 items, 25 minutes)

Mean was 41.7, SD 17.2, KR-21 reliability .95 for 2500 cases. Item 47,
though correctly keyed, is too difficult (point biserial r = .01) to be
retained. To reduce to a 60-item, 15-minute test, the following items should
be eliminated: 4, 5, 7, 8, 11, 15, 17, 20, 22, 24, 28, 29, 34, 35, 36, 39,
43, 45, 46, 47, 51, 54, 56, 57, 63, 64, 67, 68, 71, 75, 77, 79, 81, 82, 87,
90, 92, 94, 99, 100. (Picking the best three out of each sequence of five
items.)
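
The "best three out of five" rule can be expressed as a short selection routine. The sketch below is an illustration under stated assumptions: the function name is hypothetical, the discrimination values are invented, and ranking by point biserial r alone is assumed (the report does not say how ties or difficulty were weighed).

```python
def items_to_drop(discriminations, block_size=5, keep=3):
    """Return item numbers to drop, keeping the `keep` most discriminating
    items in each successive block of `block_size` items.

    `discriminations[0]` is the discrimination index of item 1.
    """
    dropped = []
    for start in range(0, len(discriminations), block_size):
        block = list(
            enumerate(discriminations[start:start + block_size], start=start + 1)
        )
        # Highest discrimination first; everything after `keep` is dropped.
        block.sort(key=lambda pair: pair[1], reverse=True)
        dropped.extend(item for item, _ in block[keep:])
    return sorted(dropped)

# Hypothetical discrimination values for a 10-item test.
disc = [0.40, 0.35, 0.30, 0.10, 0.05, 0.20, 0.02, 0.08, 0.45, 0.50]
print(items_to_drop(disc))  # [4, 5, 7, 8]
```

Selecting within blocks rather than across the whole test keeps the shortened form spread over the full range of item positions, which on a test ordered by difficulty also preserves the range of difficulty.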

(8) WPC Quantitative Skills Part A, Data Sufficiency (15 items, 10 min.)

Mean of 6.0, SD 5.1, KR-21 reliability .74 for 2412 cases. It is not
proposed that this test be shortened because of its already short time limit
and limited number of items, all of which were good.

(9) WPC Quantitative Skills Part B, Quantitative Judgment (30 items, 10 min.)

Mean of 12.7, SD 6.5, KR-21 reliability .89 for 2412 cases. Reliability is
good for a test with so short a time limit, and all items were good. It is
not proposed that Quantitative Judgment be shortened.

(10) WPC Quantitative Skills Part C, Functional Relationships
(15 items, 10 min.)

Mean of 5.0, SD 5.0, KR-21 reliability .67 for 2412 cases. Because of
this relatively low reliability and because this test is not weighted for any
predictions, it is proposed that the Quantitative Skills test be shortened by
dropping this Part entirely.

(11) WPC Mathematics Achievement (45 items, 60 minutes)

Mean of 14.7, SD 8.6, KR-21 reliability .90 for 2412 cases. To shorten the
test to a 30-item test, it is proposed to drop items 1, 2, 6, 10, 15, 18, 20,
22, 25, 29, 35, 36, 39, 41, and 44. A new test should be constructed out of
the remaining 30 items plus the 20 experimental Math Proficiency items
(described in the next section), ordered by difficulty. Separate scorings of
the "low level" and "high level" parts should be provided. The combined test
should take one hour of testing time.

A summary of shortened versions of current WPC tests:

                          Full Length        Shortened
                         Items  Minutes     Items  Minutes    Savings

1. Booklet I
   English Usage, A        60     30          30     15
   English Usage, B        30     20          15     10
   Spelling                50     10          40      8
   Reading                 40     25          29     15
   Total                          85                 48       37 min.

2. Booklet II
   Mech Reason             35     25          20     15
   Spatial Abil            24     15          16     10
   Applied Math            30     20          20     13
   Total                          60                 38       20 min.

3. Booklet III
   Vocabulary             100     25          60     15
   Quant Skills            60     30          45     20
   Math Achiev             45     60          50**   60
   Total                         115                 95       20 min.

** including 20 Math Proficiency items.
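
One way to gauge what shortening costs in reliability is the Spearman-Brown formula, which projects the reliability of a test whose length is changed by a given ratio. The report does not state that this formula was used; the sketch below is offered only as an illustration, with a hypothetical function name, and it assumes the retained items are comparable to the full set (dropping the poorest items should do somewhat better than the projection).

```python
def spearman_brown(reliability, length_ratio):
    """Projected reliability when test length is multiplied by `length_ratio`."""
    return length_ratio * reliability / (1 + (length_ratio - 1) * reliability)

# WPC Spelling: reliability .87 at 50 items, shortened to 40 items.
print(round(spearman_brown(0.87, 40 / 50), 2))  # 0.84
```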
Evaluation of replacement tests. Two tests were constructed locally as
potential replacements for WPC Spelling and Mathematics Achievement. The
latter was written by a WPC Research subcommittee.

(1) Spelling Test (50 items, 15 minutes)

Mean score was 24.65, SD 7.92, KR-21 reliability was .84 based on 55[?]5
cases. Time limit was judged appropriate from the mean, the SD, and the fact
that the distribution of scores was symmetric, tailing off to zero frequencies
for very low and very high scores. Item analysis identified 10 items with
discriminations (point-biserial correlations with total score) less than .25.
Because of the homogeneity and good distribution of item difficulties, it is
proposed that these 10 poor items (6, 8, 11, 23, 24, 26, 40, 41, 47, 50) be
eliminated and the time limit shortened to 10 minutes. As Table 1 indicates,
the experimental Spelling test correlated .70 with WPC Spelling. Alternate
forms of this experimental test were found earlier to correlate .61 to .85
with WPC Spelling (Gibson, 1969).

(2) Mathematics Proficiency (25 items, 20 minutes)

Mean score was 16.05 on Form A and 15.48 on Form B, with an SD of 5.3 on both
forms. As expected, the test was easy and had a KR-21 reliability of .86
(N = 2502 for A, N = 1500 for B). It is proposed that two poorly
discriminating items (Form A 5 and 8) and three relatively difficult items,
i.e., with proportion passing less than .46 (Form B 3, 5, and 9), be
eliminated and the resulting 20 items pooled with a shortened form of the WPC
Mathematics Achievement Test, with 1 hour total testing time. From Table 1,
the two forms correlated .81 and .77 with WPC Math Achievement, and .74 and
.71 with WPC Applied Mathematics.



Correlations of WPC Experimental Tests Given Spring 1969 with Standard WPC Battery

Evaluation of new tests not represented in old battery.

(1) GATB Mark Making was given solely to compare it with a locally constructed,
machine-scorable test designed to measure the same factor, Motor Coordination.
GATB MM is highly speeded and scored by counting the number of correctly made
marks produced in one minute. For a sample of 2,446 the mean score was
72.8 and the SD was 9.1.

(2) Motor Coordination (160 items, 1[?] minutes)

Mean score for 1,135 Ss was 54.6, SD 12.5. Unfortunately, as Table 2
shows, this test correlated only .38 with GATB MM, and new variations on the
same theme must be devised and administered again with GATB MM until a
suitable machine-scorable equivalent is found.

(3) GATB Name Matching consisted of 150 items taken in 6 minutes and had a
mean of 53.3 and SD of 12.7. Among the WPC verbal components with which it
was compared were Spelling (r = .39), English Usage (r = .32), Reading
Comprehension (r = .27), and Vocabulary (r = .23). These correlations
substantiate the belief that NM measures something not in the current WPC
battery.

(4) Perceptual Accuracy Test (48 items, 10 minutes)

Means were 8.17 for Form A and 9.[?]4 for Form B; SDs were 3.10 for Form A
and 3.32 for Form B. The numbers of cases used for the item analyses were
1,279 for Form A and 2,462 for Form B. Testing time was too limited, which
resulted in inadequate item analysis results for 22 of the 48 items, i.e.,
22 items were omitted by 20% or more of the students in both samples. The
highest score was 21 or 22. It is proposed that the 6 least discriminating
and most difficult items from among the remaining 26 (with good item analysis
data) be eliminated, producing a 20-item pool of analyzed items. It is further
proposed that the two most difficult of the 22 imperfectly analyzed items be
dropped to produce a second pool of 20 unanalyzed items. Ten items from each
pool would be combined with 10 from the other pool, producing two new 20-item
experimental forms of Perceptual Accuracy, each requiring 10 minutes testing
time. Unanalyzed items would precede analyzed items in each form. As with
GATB Name Matching, the Perceptual Accuracy measures had low correlations
(less than .50) with WPC verbal components (Table 1). The largest correlations
with current WPC tests were, interestingly enough, with Applied Mathematics
(PAA r = .55, PAB r = .53) and Mathematics Achievement (PAA r = .34,
PAB r = .30). Perceptual Accuracy, however, correlated only .33 (PAA) and
.36 (PAB) with GATB Name Matching (Table 2). It is hoped that more realistic
timing of the shortened version of PA will provide a useful measure.
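
The proposed assembly of the two new Perceptual Accuracy forms can be sketched as follows. The item counts come from the text; the item labels, the function name, and the rule for which items go to which form (first half versus second half of each pool) are hypothetical, since the report does not specify the assignment.

```python
# Counts quoted in the text for Perceptual Accuracy.
TOTAL_ITEMS = 48
POORLY_ANALYZED = 22
ANALYZED = TOTAL_ITEMS - POORLY_ANALYZED       # 26 items with good data

analyzed_pool_size = ANALYZED - 6              # drop 6 poorest -> 20
unanalyzed_pool_size = POORLY_ANALYZED - 2     # drop 2 hardest -> 20

def build_forms(analyzed, unanalyzed):
    """Split each pool in half; each form gets unanalyzed items first."""
    half_a = len(analyzed) // 2
    half_u = len(unanalyzed) // 2
    form_1 = unanalyzed[:half_u] + analyzed[:half_a]
    form_2 = unanalyzed[half_u:] + analyzed[half_a:]
    return form_1, form_2

# Hypothetical item labels: "A" = analyzed pool, "U" = unanalyzed pool.
analyzed = [f"A{i}" for i in range(1, analyzed_pool_size + 1)]
unanalyzed = [f"U{i}" for i in range(1, unanalyzed_pool_size + 1)]
form_1, form_2 = build_forms(analyzed, unanalyzed)
print(len(form_1), len(form_2))  # 20 20
```

Placing the unanalyzed items first gives them the best chance of being reached within the time limit, so that usable item statistics are obtained on the next administration.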

(5) GATB Forming Metal Objects was given to be compared with WPC Spatial
Ability, another three-dimensional space ability test, and with the
two-dimensional, experimental Alpha Numeric Assembly test. FMO consists of 40
items with a 6-minute time limit and yielded a mean score of 21.3, SD of 5.9,
and KR-21 reliability of .82 based on 2,653 cases. Although FMO had a higher
correlation with WPC Spatial Ability than with any other WPC measure, it was
only .52. Thus, although both measure three-dimensional spatial visualization,
they possess considerable uniqueness.

(6) Alpha Numeric Assembly Test (48 items, 10 minutes)

Means were 9.91 for Form A and 12.85 for Form B, with SD 4.6 for both forms.
For item analyses there were 1039 cases on Form A and 940 cases on Form B.
Testing time proved too limited, resulting in inadequate item analysis results
for 13 of the 48 items, i.e., these 13 were omitted by 20% or more of the
students in both samples. The highest score on the test was 29 or 30. It is
proposed that the 5 least discriminating items among the 35 with completed
item analyses (Form A 6 and 14, Form B 6, 7, and 9) be eliminated and the
remaining 30 items given in 15 minutes time. The 13 items with inadequate
item analysis data, Form A 15 through 24 and Form B 22, 23, and 24, should be
assembled with the 17 most discriminating items to form a second experimental
form.

Alpha Numeric Assembly correlated less than .30 with all current WPC
elements. ANA and ANB correlated .45 and .46 respectively with FMO (Table 2),
although only .22 and .22 with WPC Spatial Ability. Only by shortening testing
time can Alpha Numeric be properly judged.

(7) Vocational Interest Inventory or VII (1969 Rev.) (112 items, 20 minutes)

This test was constructed to provide ipsative scale scores ranging from
0 to 28 on eight vocational interest groups as defined by Roe (1956). Roe's
eight groups are:

1 Service
2 Business contact
3 Organization
4 Technology
5 Outdoor
6 Sciences
7 General cultural
8 Arts and entertainment

The first set of 56 items are occupations with socio-economic level
controlled. Each interest group is presented 14 times, with each group paired
with every other group two times; thus a maximum of 14 points on any given
group is possible from this section. The second set of 56 items consists of
competing activities based on the 8 groups, so that again each group is
matched against each other group twice.
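
The item counts in this design follow directly from pairing eight groups. The sketch below checks the arithmetic; the function name is hypothetical, and the code models only the pairing structure, not the actual item content.

```python
from collections import Counter
from itertools import combinations

GROUPS = [
    "Service", "Business contact", "Organization", "Technology",
    "Outdoor", "Sciences", "General cultural", "Arts and entertainment",
]

def build_section(groups, repeats=2):
    """One forced-choice item per pair of groups, each pair used `repeats` times."""
    return [pair for pair in combinations(groups, 2) for _ in range(repeats)]

section = build_section(GROUPS)
appearances = Counter(group for pair in section for group in pair)

print(len(section))               # 56 items per section
print(set(appearances.values()))  # {14}: every group appears 14 times
```

Two such sections give 112 items and a maximum of 28 points per group. Because every item awards exactly one point, the eight scale scores always sum to 112, which is what makes the scores ipsative: a high score on one group necessarily lowers the scores available to the others.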

The item analysis data on which the 1970 version of the VII was based 
came from high school juniors tested in May 1968 (609 males, 581 females). 

The data were analyzed separately for the sexes so that items were modified 
on the following bases: 
(a) Each item-scale correlation had to be positive and higher than
any other r between that item and any other scale. For example, item 26
used to read "(a) assistant caretaker in zoo," which was supposed
to be most highly correlated with "science." For men it correlated
.16 with science; for women, .16. However, selecting "a" correlated
.28 with "outdoors" among men and .30 among women. (On the
other hand, "b," which used to read "orderly," was clearly associated
with "service"; no problem.) But "a" had to be rewritten to get
the outdoors out and science in at socio-economic level 6. The item
now compares "diet kitchen helper" (science) with "nurse's aide"
(service).

(b) In the above example "orderly" gave way to "nurse's aide" on the
basis of a need to keep both of an item's alternatives either neutral
or stereotypically masculine or feminine, so that choice would not
be made on the basis of sex. By comparing the proportions of males
and females endorsing alternatives, it was possible to reduce this
confounding tendency. For example, although item 24 was a good measure
of technology using "bulldozer operator" (.35, .38), the percent of
men endorsing this activity was 68 and the percent of women 26.
Clearly, bulldozer operator had to be replaced by some job in
technology at level 6 which was more feminine, something as neutral
as its alternative, "sorting machine operator." It is hoped that
"automobile upholstery cutter" will do the job.

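The two revision criteria can be expressed as simple checks on an item alternative. This is a sketch only: the function names are hypothetical, the correlations and percentages are the ones quoted in the text, and no cutoff for an acceptable sex difference is assumed because the report does not give one.

```python
def keyed_scale_is_best(correlations, keyed_scale):
    """Criterion (a): the r with the keyed scale must be positive and
    higher than the r with every other scale."""
    keyed_r = correlations[keyed_scale]
    others = [r for scale, r in correlations.items() if scale != keyed_scale]
    return keyed_r > 0 and all(keyed_r > r for r in others)

def endorsement_gap(pct_men, pct_women):
    """Criterion (b): absolute difference in percent endorsing, by sex."""
    return abs(pct_men - pct_women)

# Item 26, alternative "a", men: keyed to science but correlating more
# highly with outdoors (figures from the text; other scales omitted).
item_26a_men = {"science": 0.16, "outdoors": 0.28}
print(keyed_scale_is_best(item_26a_men, "science"))  # False

# "Bulldozer operator": endorsed by 68% of men and 26% of women.
print(endorsement_gap(68, 26))  # 42
```

An alternative that fails the first check is rewritten toward its keyed scale; one with a large endorsement gap is replaced by a more neutral occupation or activity at the same socio-economic level.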
It is proposed that the time limit be increased to 25 minutes (and the
test scheduled at the end of the morning testing session) to permit students
to finish the entire test. The VII is ready to be given to all WPC
participants. When normative data are assembled in summer 1970, score reports
can be prepared and sent along with WPC test results for interpretation in
the high school senior year. Standard scores and rankings will be tried
initially as the reporting format.

A summing up of recommendations with respect to WPC experimental testing
in spring 1970 includes:

(1) The battery can be divided into three Booklets (see the summary of
shortened versions above), each of which has a long and a short form. Each
testing center should administer one long-form and two short-form booklets
to permit enough experimental testing time with no loss in predictive
accuracy from the current battery.

(2) The experimental tests to be used in 1970 should include revised
versions of these tests given in 1969:

Spelling (could be considered a direct replacement)
GATB Mark Making (still needed in view of poor correlation for MC)
Motor Coordination
GATB Name Matching
Perceptual Accuracy (2 versions)
GATB Forming Metal Objects
Alpha Numeric Assembly (2 versions)

(3) Assuming the following six sets of testing centers, the 1970 experimental
tests should be distributed as follows:

Group A  CWSC, Centralia, Edmonds, Seattle Community, Seattle Pacific,
         Whitworth.

Group B  Columbia Basin, Ft. Steilacoom, Shoreline, Tacoma Community, WWSC.

Group C  Everett, Grays Harbor, Green River, Highline, Yakima.

Group D  Lower Columbia, Seattle Univ, Skagit Valley, Spokane Community,
         Walla Walla and Whitman.

Group E  Bellevue, Clark, Olympic, WSU, Wenatchee.

Group F  EWSC, Big Bend, PUG, Peninsula, St. Martins, UW.




Groups have been formed to provide approximately equal numbers of tested
high school juniors (based upon 1969 testing) and geographical spread.

(4) Battery composition

The following is suggested for each of the above groups:

Group A:  Long form Booklet I, short form Booklet II, short form
          Booklet III, Vocational Interest Inventory, Motor Coordination,
          Mark Making, Forming Metal Objects, Alpha Numeric Assembly (A).

Group B:  Long form Booklet I, short form Booklet II, short form
          Booklet III, Vocational Interest Inventory, Motor Coordination,
          Name Matching, Perceptual Accuracy (A).

Group C:  Short form Booklet I, long form Booklet II, short form
          Booklet III, Vocational Interest Inventory, Motor Coordination,
          Verbal Analogies (A)*, Forming Metal Objects, Perceptual
          Accuracy (B).

Group D:  Short form Booklet I, long form Booklet II, short form
          Booklet III, Vocational Interest Inventory, Motor Coordination,
          Verbal Analogies (B), Name Matching, Alpha Numeric Assembly (B).

Group E:  Short form Booklet I, short form Booklet II, long form
          Booklet III, Vocational Interest Inventory, Motor Coordination,
          Verbal Analogies (A), Spelling, Perceptual Accuracy (A).

Group F:  Short form Booklet I, short form Booklet II, long form
          Booklet III, Vocational Interest Inventory, Motor Coordination,
          Verbal Analogies (B), Spelling, Perceptual Accuracy (B).

* Verbal Analogies A and B are 15-minute tests provided to WPC by Lackland
Air Force Base.
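
The rotation of long and short booklet forms across the six groups can be checked with a small sketch. The assignments are taken from the group listings above; the data structure and function name are hypothetical.

```python
# Form of Booklets I, II, III administered to each group of testing centers.
FORMS = {
    "A": ("long", "short", "short"),
    "B": ("long", "short", "short"),
    "C": ("short", "long", "short"),
    "D": ("short", "long", "short"),
    "E": ("short", "short", "long"),
    "F": ("short", "short", "long"),
}

def long_form_counts(forms):
    """Number of groups taking the long form of each booklet (I, II, III)."""
    counts = [0, 0, 0]
    for booklets in forms.values():
        for index, form in enumerate(booklets):
            if form == "long":
                counts[index] += 1
    return counts

# Every group takes exactly one long-form booklet and two short forms.
print(all(b.count("long") == 1 for b in FORMS.values()))  # True
# Each booklet's long form is given in exactly two groups.
print(long_form_counts(FORMS))  # [2, 2, 2]
```

Under this arrangement every long-form test continues to be administered somewhere in the state, while each center gains the time saved on two short-form booklets for experimental testing.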